
Random Variables

Expected Value​

  • E[X + Y] = E[X] + E[Y]
  • E[aX + b] = aE[X] + b
    Note how the Expected Value is shifted by amount b
  • E[E[Y|X]] = E[Y]
    This one's a doozy! (There's a worked example just below.)
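
A quick worked example of that last one (the Law of Total Expectation), with made-up numbers: flip a fair coin X; if it lands heads, Y is the roll of a fair six-sided die, otherwise Y = 0. Then

E[Y \mid X = \text{heads}] = 3.5, \quad E[Y \mid X = \text{tails}] = 0

E[E[Y \mid X]] = \tfrac{1}{2}(3.5) + \tfrac{1}{2}(0) = 1.75 = E[Y]

Averaging the conditional averages, weighted by how likely each condition is, recovers the overall average.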

Note that when you're computing E[XY], you're looking at \sum_{x,y}{xy \, f_{X,Y}(x, y)}, i.e. the joint PDF or PMF of Random Variables X and Y. If these are independent, the joint distribution factors as f_{X,Y}(x, y) = f_X(x)f_Y(y) and the sum collapses to E[XY] = E[X]E[Y].
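
Here's a minimal numpy sanity check of those rules (not from the source; the distributions and constants are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables, chosen just for this check
X = rng.normal(loc=2.0, scale=1.0, size=n)    # E[X] = 2
Y = rng.exponential(scale=3.0, size=n)        # E[Y] = 3

# E[X + Y] = E[X] + E[Y]
print(np.mean(X + Y), np.mean(X) + np.mean(Y))    # both ~5.0

# E[aX + b] = a E[X] + b
a, b = 4.0, -1.0
print(np.mean(a * X + b), a * np.mean(X) + b)     # both ~7.0

# For independent X, Y: E[XY] = E[X] E[Y]
print(np.mean(X * Y), np.mean(X) * np.mean(Y))    # both ~6.0
```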

Variance​

  • Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2
    The "spread" about the mean.
  • Var(aX + b) = a^2 Var(X)
    Note that b does not shift the Variance (unlike with Expected Value!)
  • Var(X \pm Y) = Var(X) + Var(Y)
    If X and Y are independent. Always a plus sign, because the -1 gets squared: (-1)^2 = 1
  • Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y)
    If they're dependent
  • Var(\sum{X_i}) = \sum{Var(X_i)}
    For independent X_i
  • Var(\sum{(a_i X_i + b_i)}) = \sum{a_i^2 Var(X_i)}
    Again for independent X_i; the b_i shifts drop out
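
And a similar numpy sketch for the variance rules above, again with made-up distributions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

X = rng.normal(loc=0.0, scale=2.0, size=n)   # Var(X) = 4
Y = rng.uniform(low=0.0, high=6.0, size=n)   # Var(Y) = 36/12 = 3

a, b = 3.0, 10.0
# Var(aX + b) = a^2 Var(X); the shift b drops out
print(np.var(a * X + b), a**2 * np.var(X))   # both ~36

# Independent X, Y: Var(X + Y) = Var(X - Y) = Var(X) + Var(Y)
print(np.var(X + Y), np.var(X - Y), np.var(X) + np.var(Y))   # all ~7
```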

Why Always Add Variances?​

We buy some cereal. The box says β€œ16 ounces.” We know that’s not precisely the weight of the cereal in the box, just close. After all, one corn flake more or less would change the weight ever so slightly. Weights of such boxes of cereal vary somewhat, and our uncertainty about the exact weight is expressed by the variance (or standard deviation) of those weights.

Next we get out a bowl that holds 3 ounces of cereal and pour it full. Our pouring skill is not very precise, so the bowl now contains about 3 ounces with some variability (uncertainty).

How much cereal is left in the box? Well, we assume about 13 ounces. But notice that we’re less certain about this remaining weight than we were about the weight before we poured out the bowlful. The variability of the weight in the box has increased even though we subtracted cereal.

Moral: Every time something happens at random, whether it adds to the pile or subtracts from it, uncertainty (read β€œvariance”) increases.

-- Source
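
The cereal story as a quick simulation (the weights and spreads below are invented purely to illustrate): subtracting a random bowlful still adds its variance to the leftover amount.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Invented numbers: box weight ~ N(16, 0.2^2) oz, poured bowl ~ N(3, 0.1^2) oz
box = rng.normal(16.0, 0.2, size=n)
bowl = rng.normal(3.0, 0.1, size=n)
remaining = box - bowl

print(np.var(box))                   # ~0.04
print(np.var(remaining))             # ~0.05: larger, even though we removed cereal
print(np.var(box) + np.var(bowl))    # ~0.05 = Var(box) + Var(bowl)
```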

Covariance​

  • Cov(X, X) = Var(X)
  • Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
    Just expand it out and you'll see.
  • Cov(X + a, Y + b) = Cov(X, Y)
  • Cov(aX, bY) = ab Cov(X, Y)
  • Cov(aX + bY, cP + dQ) = ac Cov(X, P) + ad Cov(X, Q) + ...
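
A numpy sketch of the shift and scaling rules (the data here is invented, and cov below is a hypothetical helper implementing E[XY] - E[X]E[Y]):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Build two correlated variables: Y depends on X plus independent noise
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)

def cov(u, v):
    """Sample covariance via E[UV] - E[U]E[V]."""
    return np.mean(u * v) - np.mean(u) * np.mean(v)

print(cov(X, X), np.var(X))                  # Cov(X, X) = Var(X)
print(cov(X + 7, Y - 2), cov(X, Y))          # constant shifts don't matter
print(cov(3 * X, 4 * Y), 12 * cov(X, Y))     # Cov(aX, bY) = ab Cov(X, Y)
```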

Correlation​

Between two variables X and Y, it's the Covariance of (X, Y) normalized by the product of their standard deviations1.

Corr(X, Y) = \rho = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}, \quad -1 \leq \rho \leq 1

It measures the linearity of the relationship between Random Variables (or Vectors!)

\rho(X,Y) = 1 \implies Y = aX + B, \quad a = \sigma_y/\sigma_x
\rho(X,Y) = -1 \implies Y = aX + B, \quad a = -\sigma_y/\sigma_x
\rho(X,Y) = 0 \implies \text{No linearity}
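
For example (a made-up numpy check), any exact linear relationship lands at \rho = \pm 1:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=100_000)

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is Corr(X, Y)
print(np.corrcoef(X,  2.0 * X + 5.0)[0, 1])   # ~ +1.0
print(np.corrcoef(X, -2.0 * X + 5.0)[0, 1])   # ~ -1.0
```
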
warning

A correlation of zero does not necessarily mean that X and Y are independent!
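
A classic counterexample, sketched with numpy: Y = X^2 is completely determined by X, yet for a symmetric X the correlation comes out approximately zero.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=1_000_000)   # symmetric about 0
Y = X ** 2                       # fully dependent on X

print(np.corrcoef(X, Y)[0, 1])   # ~ 0.0, despite total dependence
```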

Disjointedness and Independence​

Disjoint Events​

Putting all that stuff up there together. Disjointedness is also called "Mutual Exclusivity": the events have nothing in common in the Sample and Event Spaces. If one happens, the other cannot, so their joint distribution puts zero probability on both occurring and the product XY is always zero. So,

E[XY] = 0 \implies Cov(X, Y) = E[XY] - E[X]E[Y] = -E[X]E[Y]

Now you can look at the PDFs or PMFs of X and Y and expand out that equation2. But what it's saying with that negative sign is: if X happens, Y cannot.
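
Concretely, using the indicator trick from footnote 2: let X = 1_A and Y = 1_B for disjoint events A and B. Then

XY = 1_A 1_B = 1_{A \cap B} = 0, \quad E[X] = P(A), \quad E[Y] = P(B)

Cov(X, Y) = E[XY] - E[X]E[Y] = 0 - P(A)P(B) = -P(A)P(B) \leq 0

so the covariance is strictly negative whenever both events have positive probability: seeing A rules B out entirely.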

Independent Events​

For two independent events X and Y,

E[XY] = E[X]E[Y] \implies Cov(X, Y) = E[XY] - E[X]E[Y] = E[X]E[Y] - E[X]E[Y] = 0

Summary​

| Property | Disjoint (AKA Mutually Exclusive) | Independent |
| --- | --- | --- |
| Probability - Intersection | P(X \cap Y) = 0 | P(X \cap Y) = P(X)P(Y) |
| Probability - Conditional | If both have positive probability: P(X \mid Y) = 0, P(Y \mid X) = 0 (knowing one excludes the other) | P(X \mid Y) = P(X), P(Y \mid X) = P(Y) (knowing one gives no info) |
| Expectation - Additivity | E[X + Y] = E[X] + E[Y] (always true) | Same (always true) |
| Expectation - Product | E[XY] = 0 | E[XY] = E[X]E[Y] |
| Variance of Sum | Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y) | Var(X + Y) = Var(X) + Var(Y) |
| Covariance | Cov(X, Y) = -E[X]E[Y] | Cov(X, Y) = 0 |
| Independence implied? | ❌ Never (unless trivial zero-probability case) | ✅ By definition |
| Disjointness implied? | ✅ By definition | ❌ Never (unless trivial zero-probability case) |

Higher Order Moments​

| Order | Name | Formula |
| --- | --- | --- |
| First | Mean | \frac{\sum{x}}{n} |
| Second | Variance | \frac{\sum{x^2}}{n} \rightarrow \frac{\sum{(x - \mu)^2}}{n} |
| Third | Skewness | \frac{\sum{x^3}}{n} \rightarrow \frac{\sum{(x - \mu)^3}}{n} \rightarrow \frac{1}{n}\frac{\sum{(x - \mu)^3}}{\sigma^3} |
| Fourth | Kurtosis | \frac{\sum{x^4}}{n} \rightarrow \frac{\sum{(x - \mu)^4}}{n} \rightarrow \frac{1}{n}\frac{\sum{(x - \mu)^4}}{\sigma^4} |
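
A small numpy sketch computing the standardized third and fourth moments from the table, on made-up data (an exponential sample, which is visibly right-skewed):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0, size=1_000_000)

mu = np.mean(x)                                  # first moment: mean
var = np.mean((x - mu) ** 2)                     # second central moment: variance
sigma = np.sqrt(var)

skewness = np.mean((x - mu) ** 3) / sigma ** 3   # third standardized moment
kurtosis = np.mean((x - mu) ** 4) / sigma ** 4   # fourth standardized moment

print(mu, var)       # Exp(1): both ~1
print(skewness)      # ~2 for Exp(1)
print(kurtosis)      # ~9 for Exp(1)
```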

Footnotes​

  1. This means we're making the result unitless and bounding it to between [-1, 1] ↩

  2. One trick for demonstration purposes is to set the Random Variable to the indicator 1_A (the "Indicator Random Variable"): X = 1 if A happens, else 0 if A doesn't. So E[X] just becomes the probability of A. ↩